Abstract — Principal Component Analysis (PCA) has been widely used as a data reduction technique to overcome the curse of dimensionality. In this research we show a different use of PCA: as a tool for data fusion. PCA-based fusion is applied to Multiangle Imaging Spectroradiometer (MISR) data to study dust storms and better support their identification. The multi-angle viewing capability of MISR is used to enhance our understanding of the Earth’s environment, including climate, particularly the atmosphere and land surfaces. In this research, the multi-angle MISR images clearly show a dust storm over the Liaoning region of China, as well as parts of northern and western Korea, on April 8, 2002. PCA is used to combine the information obtained from the different viewing angles and frequency bands of the MISR datasets. A quantitative measure is introduced by performing K-means clustering on both the original and the assimilated (fused) products. Classifying the first 4 principal components (PCs), which hold 95% of the information content, gave results similar to those obtained by classifying the original datasets.
I. INTRODUCTION

Dust storms are a significant contributor to air pollution in urban areas and a health hazard for people with respiratory problems [1], [2]. Hence, timely dust storm warnings must be issued in populated regions for health and traffic-control purposes. Storms can travel over large parts of the Earth, in Asia and Africa, and can even affect North America and Europe. As an environmental phenomenon, dust storms have increased in East Asia over the last decade; this increase is attributed to massive deforestation and more frequent droughts. As a climate-related phenomenon, dust storms modify the energy budget by cooling and heating the atmosphere [3], [4], [5]. Detecting and tracking dust storms can be difficult because they share some characteristics with clouds. Dust storms also vary in shape, particle size, and distribution, and hence normally show varying behavior.

PCA is a linear transformation of a multivariate dataset into a new coordinate system. In remote sensing applications, the multiple variables are typically the different bands of multispectral or hyperspectral data. PCA has been widely used as a dimension reduction technique: it can reduce the dimensionality of a dataset while retaining most of the variance, by concentrating the majority of the information into the first few components [6], [7], [8]. In this research we use PCA as a tool for fusing multi-angle MISR observations. The technique is demonstrated on a case study of a large dust plume observed over the Liaoning region of China on April 8, 2002, shown in Fig. 1.


Figure 1. Different angle views of a large dust plume over the Liaoning region of China and parts of northern and western Korea on April 8, 2002.

These multi-angle measurements can provide more information than traditional single-angle remote sensing measurements, and thus can enhance fine discrimination between different materials.

In that respect, we see our fusion method as a tool for combining two or more different images into a new image, with the aim of obtaining information of greater quality. Data fusion formalizes the combination of this information and also monitors the quality of the information during the fusion process [9], [10]. Data fusion usually takes place at three different levels: pixel, feature, and decision [11], [12]. In this work, we use the feature information revealed by the dust event for the fusion process. Here, a feature refers to the GIS information that helps in classifying multispectral images provided by several sensors, in our case the MISR cameras.

Using MISR's capability of viewing at different angles, identification of dust storms can be greatly improved [13], [14]. For example, dust storm events that are difficult to detect in nadir views may be easily detected in off-nadir views, because off-nadir cameras look through a thicker layer of the atmosphere. MISR has the potential to enhance the detection of small dust storms, and thus might be helpful in their early detection. It has been shown that MISR can be used to detect large dust storms, such as the one over the northwestern part of India [13], [14].
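The "thicker layer" argument can be made concrete with a plane-parallel approximation, in which the slant path through a layer of vertical thickness H is H / cos θ for a view zenith angle θ. The sketch below assumes the nominal MISR camera view angles (0°, 26.1°, 45.6°, 60.0° and 70.5° for the nadir and the A to D cameras) and ignores Earth curvature and refraction; it is an illustration added for clarity, not part of the original analysis.

```python
import math

# Nominal MISR view zenith angles in degrees. Assumption: taken from the MISR
# instrument description (nadir plus the A to D camera pairs), not from this paper.
MISR_VIEW_ANGLES_DEG = {"Nadir": 0.0, "A": 26.1, "B": 45.6, "C": 60.0, "D": 70.5}


def relative_slant_path(view_zenith_deg: float) -> float:
    """Slant path through a plane-parallel layer, relative to the nadir path.

    With layer thickness H the path is H / cos(theta), so the ratio to the
    nadir path is 1 / cos(theta). Earth curvature and refraction are ignored.
    """
    return 1.0 / math.cos(math.radians(view_zenith_deg))


if __name__ == "__main__":
    for camera, angle in MISR_VIEW_ANGLES_DEG.items():
        print(f"{camera:>5}: {angle:5.1f} deg -> {relative_slant_path(angle):.2f} x nadir path")
```

Under these assumptions the D cameras look through roughly three times the nadir path (1 / cos 70.5° ≈ 3.0), which is consistent with thin dust being easier to see in the steepest off-nadir views.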

In the current research we focus on integrating the spatial and spectral information rendered by the different viewing angles and the different frequencies, with PCA serving as the data fusion tool. The data used comprise 4 different frequency bands, each observed from 5 different viewing angles. Fusion is performed with PCA in two different ways: once by fixing the frequency component and once by fixing the angular component. Combining such information could be useful in discriminating between dust clouds and regular clouds. It could also help decrease background effects over desert regions by selecting suitable viewing angles.
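The two fusion orderings amount to grouping the same 20 bands (5 cameras × 4 frequency bands) along different axes. The sketch below assumes the co-registered MISR radiances have already been loaded into a NumPy array of shape (cameras, bands, rows, cols); the axis order, the function names and the random test data are hypothetical and serve only to show the reshaping.

```python
import numpy as np

N_CAMERAS, N_BANDS = 5, 4  # 5 viewing angles x 4 frequency bands = 20 bands


def all_bands_matrix(cube: np.ndarray) -> np.ndarray:
    """(cameras, bands, rows, cols) cube -> (pixels, cameras * bands) matrix."""
    cams, bands, rows, cols = cube.shape
    # Column index is camera * bands + band; each row is one pixel.
    return cube.reshape(cams * bands, rows * cols).T


def groups_fixed_frequency(cube: np.ndarray) -> list[np.ndarray]:
    """One (pixels, cameras) matrix per frequency band: fusion across angles."""
    cams, bands, rows, cols = cube.shape
    return [cube[:, b].reshape(cams, rows * cols).T for b in range(bands)]


def groups_fixed_camera(cube: np.ndarray) -> list[np.ndarray]:
    """One (pixels, bands) matrix per camera: fusion across frequencies."""
    cams, bands, rows, cols = cube.shape
    return [cube[c].reshape(bands, rows * cols).T for c in range(cams)]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    cube = rng.random((N_CAMERAS, N_BANDS, 64, 64))  # synthetic stand-in for MISR radiances
    print(all_bands_matrix(cube).shape)                                              # (4096, 20)
    print(len(groups_fixed_frequency(cube)), groups_fixed_frequency(cube)[0].shape)  # 4 (4096, 5)
    print(len(groups_fixed_camera(cube)), groups_fixed_camera(cube)[0].shape)        # 5 (4096, 4)
```

Keeping pixels as rows and bands as columns means every grouping can be passed to the same PCA routine unchanged.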


II. ANALYSIS AND DISCUSSION

Using PCA for data fusion, we performed three different experiments with various angular and frequency combinations. A quantitative analysis is carried out on the output of each experiment to compare the obtained classes with those obtained from the original dataset. The computational complexity of these processes is not the theme of the current research; it will be discussed thoroughly in a future publication.

A. Experimental Layout

In the first experiment (see Fig. 2), we combined all the bands rendered by the different angle images into a multidimensional product with 20 bands and performed PCA on it. The first of the 20 obtained components contains about 93.9 percent of the total data variation (information content). The classification results were produced using the first few components as the representative set.
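Experiment one can be sketched as a single PCA over the 20-band pixel matrix, followed by keeping the smallest number of leading components whose cumulative share of variance reaches a target (95% in the abstract). This is a generic NumPy sketch on synthetic data, assuming mean-centering and an eigendecomposition of the band covariance matrix; the function names and the synthetic correlated bands are illustrative, and it does not reproduce the paper's numbers.

```python
import numpy as np


def pca(X: np.ndarray):
    """PCA of a (pixels, bands) matrix via eigendecomposition of the band covariance.

    Returns eigenvalues in descending order, the matching eigenvectors (as
    columns) and the principal component scores (pixels, bands).
    """
    Xc = X - X.mean(axis=0)                    # mean-center each band
    cov = np.cov(Xc, rowvar=False)             # (bands, bands) covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigh: ascending order for symmetric input
    order = np.argsort(eigvals)[::-1]          # reorder to descending variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    return eigvals, eigvecs, Xc @ eigvecs


def n_components_for(eigvals: np.ndarray, target: float = 0.95) -> int:
    """Smallest k such that the first k eigenvalues hold at least `target` of the variance."""
    cumulative = np.cumsum(eigvals) / eigvals.sum()
    return min(int(np.searchsorted(cumulative, target)) + 1, len(eigvals))


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    scene = rng.normal(size=(10_000, 1))                   # shared signal seen by all bands
    X = scene @ rng.uniform(0.5, 1.5, size=(1, 20))        # 20 strongly correlated bands
    X += 0.1 * rng.normal(size=X.shape)                    # band-specific noise
    eigvals, _, scores = pca(X)
    k = n_components_for(eigvals, 0.95)
    print(f"PC1 holds {eigvals[0] / eigvals.sum():.1%} of the variance; k = {k}")
    reduced = scores[:, :k]                                # representative set for classification
```

Because the bands are highly correlated, almost all the variance collapses into the first component, which mirrors the behavior reported for the MISR bands.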


Figure 2. Experiment one layout showing the methodology used for comparing classification results from PCA-based fused data (components retaining 95% of the information) to original data.


In the second experiment, we first fused the information from the different angles for each frequency. The first eigenvalue of each frequency contains approximately 94.2 percent of the total data variance. These first components, which carry information from the different cameras for each frequency, are then fused as shown in Fig. 3 to produce a result comparable to that of the first experiment.


Figure 3. Experiment two layout showing the frequency-based methodology used for comparing classification results from PCA-based fused data to original data.


In the third experiment, we first fused the information from the different frequencies for each individual camera. The first eigenvalue of each camera contains approximately 99 percent of the total data variance. These first components, which carry information from the different frequencies for each camera, are then fused as shown in Fig. 4 to produce a result comparable to those of the first and second experiments.
Figure 4. Experiment three layout showing the camera-based methodology used for comparing classification results from PCA-based fused data to original data.
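Experiments two and three share a two-stage structure: a PCA within each group (one frequency across the 5 cameras, or one camera across the 4 frequencies) keeps only the first component, and a second PCA fuses those first components. A minimal NumPy sketch follows; it assumes a (cameras, bands, rows, cols) cube, and the helper names and synthetic data are hypothetical rather than the authors' code.

```python
import numpy as np


def first_pc(X: np.ndarray) -> tuple[np.ndarray, float]:
    """First principal component scores of a (pixels, vars) matrix and its variance share."""
    Xc = X - X.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    # eigh returns ascending eigenvalues, so the last one is the largest
    return Xc @ eigvecs[:, -1], float(eigvals[-1] / eigvals.sum())


def two_stage_fusion(cube: np.ndarray, fix: str) -> tuple[np.ndarray, list[float]]:
    """Stage 1: PCA per group, keep PC1. Stage 2: PCA over the stacked PC1 images.

    fix="frequency": one group per band, variables = cameras (experiment two).
    fix="camera":    one group per camera, variables = bands (experiment three).
    """
    cams, bands, rows, cols = cube.shape
    if fix == "frequency":
        groups = [cube[:, b].reshape(cams, -1).T for b in range(bands)]
    elif fix == "camera":
        groups = [cube[c].reshape(bands, -1).T for c in range(cams)]
    else:
        raise ValueError("fix must be 'frequency' or 'camera'")

    stage1 = [first_pc(g) for g in groups]
    shares = [share for _, share in stage1]                # PC1 variance share per group
    stacked = np.column_stack([pc for pc, _ in stage1])    # (pixels, n_groups)

    # Stage 2: ordinary PCA on the stacked first components
    Sc = stacked - stacked.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(Sc, rowvar=False))
    order = np.argsort(eigvals)[::-1]
    fused = Sc @ eigvecs[:, order]                         # fused PC images, descending variance
    return fused.reshape(rows, cols, -1), shares


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    cube = rng.random((5, 4, 32, 32))                             # synthetic stand-in for MISR data
    fused_f, shares_f = two_stage_fusion(cube, fix="frequency")   # experiment two
    fused_c, shares_c = two_stage_fusion(cube, fix="camera")      # experiment three
    print(fused_f.shape, len(shares_f))   # (32, 32, 4) 4
    print(fused_c.shape, len(shares_c))   # (32, 32, 5) 5
```

The per-group `shares` correspond to the kind of first-component percentages listed per frequency and per camera in Table I.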


As the three layouts show, MISR serves as a single sensor that provides different spatial (angular) information content across the visible and IR spectrum. This unique property comes from the availability of multi-camera observations.


B. Principal Components

Computing the principal components involves computing the covariance matrix of the data and its eigenvalues and eigenvectors, and then projecting the mean-centered data onto the eigenvectors to form the principal components. In PCA, the components are arranged in descending order of variance, or information content, so most of the information contained in the data can be found in the first few principal components. Table I below shows the percent information contained in the first principal component obtained from each experiment.
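Written out, the computation above takes the following form. The symbols are introduced here for illustration and do not appear in the original text: x_i is the p-band vector of pixel i out of n pixels, C is the band covariance matrix, λ_k and v_k are its eigenvalues and eigenvectors, and y_ik is the k-th principal component score of pixel i. The last line is the ratio tabulated as information content in Table I.

```latex
\[
\bar{\mathbf{x}} = \frac{1}{n}\sum_{i=1}^{n}\mathbf{x}_i,
\qquad
\mathbf{C} = \frac{1}{n-1}\sum_{i=1}^{n}(\mathbf{x}_i-\bar{\mathbf{x}})(\mathbf{x}_i-\bar{\mathbf{x}})^{\top}
\]
\[
\mathbf{C}\,\mathbf{v}_k = \lambda_k\,\mathbf{v}_k,
\qquad \lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p \ge 0,
\qquad y_{ik} = \mathbf{v}_k^{\top}(\mathbf{x}_i-\bar{\mathbf{x}})
\]
\[
\text{information content of PC}_k = \frac{\lambda_k}{\sum_{j=1}^{p}\lambda_j},
\qquad \text{e.g. } \frac{10.732}{11.423} \approx 0.9395 \text{ for the all-band experiment}
\]
```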


TABLE I. Comparison of the first principal component's information content obtained from the three experiments.

Experiment       Group      First eigenvalue (×10^9)  Sum of eigenvalues (×10^9)  Information content (fraction)  Variation (%)
All bands        Cam+Freq   10.732                    11.423                      0.939                           93.954
Frequency based  IR         2.574                     2.731                       0.942                           94.255
Frequency based  Red        2.657                     2.822                       0.941                           94.155
Frequency based  Green      2.758                     2.921                       0.944                           94.423
Frequency based  Blue       2.789                     2.948                       0.946                           94.613
Camera based     Nadir      2.698                     2.706                       0.996                           99.694
Camera based     Camera A   2.253                     2.269                       0.993                           99.320
Camera based     Camera B   2.179                     2.199                       0.990                           99.092
Camera based     Camera C   2.127                     2.149                       0.989                           98.987
Camera based     Camera D   2.076                     2.098                       0.989                           98.914

The first eigenvalue in every experiment accounts for more than 90 percent of the total data variation (i.e., of the information) present in the original dataset. Hence, if we are only interested in 90 percent of the information content (data variation), the intrinsic dimensionality of this dataset is effectively 1. The remaining PCs together contain less than 10 percent of the total variation of the original bands.
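The information-content column of Table I is simply the first eigenvalue divided by the sum of all eigenvalues. The short check below recomputes it from the tabulated (rounded) values and applies the 90 percent criterion used above; the dictionary layout and function name are ours, and small differences of a few hundredths of a percentage point come from the three-decimal rounding of the eigenvalues.

```python
# Values copied from Table I: (first eigenvalue, sum of eigenvalues, reported
# percent variation); eigenvalues are in units of 10^9 and rounded to 3 decimals.
TABLE_I = {
    "All (Cam+Freq)": (10.732, 11.423, 93.954),
    "IR": (2.574, 2.731, 94.255),
    "Red": (2.657, 2.822, 94.155),
    "Green": (2.758, 2.921, 94.423),
    "Blue": (2.789, 2.948, 94.613),
    "Nadir": (2.698, 2.706, 99.694),
    "Camera A": (2.253, 2.269, 99.320),
    "Camera B": (2.179, 2.199, 99.092),
    "Camera C": (2.127, 2.149, 98.987),
    "Camera D": (2.076, 2.098, 98.914),
}

THRESHOLD = 0.90  # keep only enough components for 90 percent of the variation


def percent_from_eigenvalues(first: float, total: float) -> float:
    """Share of total variance carried by PC1, in percent."""
    return 100.0 * first / total


max_gap = max(abs(percent_from_eigenvalues(first, total) - reported)
              for first, total, reported in TABLE_I.values())
pc1_suffices = all(first / total >= THRESHOLD for first, total, _ in TABLE_I.values())

print(f"largest gap from the reported percentages: {max_gap:.3f} points")
print(f"PC1 alone reaches {THRESHOLD:.0%} in every case: {pc1_suffices}")
```

Run as-is, the largest gap is under 0.05 percentage points and the 90 percent criterion holds for every row, so an effective dimensionality of 1 follows directly from the tabulated values.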

The percent variation in the camera-based experiment (about 99 percent) is noticeably higher than in the first two experiments (about 94 percent). This can be attributed to PCA being more efficient in the spectral domain than in the spatial domain. The first experiment showed the lowest percent variation, as expected, since it fuses data from both the spatial and spectral domains and hence has no specific theme for the fusion process.

C. K-means Clustering

To examine the three PCA-based fusion layouts described above, a quantitative analysis is required. In our case, we use k-means clustering to compare the three fused outputs with the original data. The k-means algorithm makes use of the spatial and spectral information contained in the bands of the image under study. It creates clusters of discrete classes, each containing a set of similar objects. A good clustering result is one in which objects in the same class are more or less alike, and objects in different classes are in some sense different [15].

This technique was applied four times: once on the original data and once on the output of each of the three experiments. The same settings, shown in Table II, were used in every run.

TABLE II. K-means classification settings used for the three experiments.

Parameter           Value
Number of classes   5
Change threshold    8%
Maximum iterations  15
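For reference, the clustering step can be sketched as plain k-means (Lloyd iterations) on per-pixel feature vectors, using the Table II settings as defaults: 5 classes, at most 15 iterations, and a stop when fewer than 8% of pixels change class between iterations. Reading the change threshold as the fraction of pixels whose class changes is our assumption, as is the random initialization from data points; this is a generic sketch, not the implementation the authors used.

```python
import numpy as np


def kmeans(X: np.ndarray, n_classes: int = 5, change_threshold: float = 0.08,
           max_iter: int = 15, seed: int = 0) -> np.ndarray:
    """Plain k-means on a (pixels, features) matrix; returns one label per pixel.

    Stops after `max_iter` iterations or when the fraction of pixels that
    changed class drops below `change_threshold` (assumed meaning of the
    Table II "change threshold").
    """
    rng = np.random.default_rng(seed)
    # Initialize centers from randomly chosen distinct pixels
    centers = X[rng.choice(len(X), size=n_classes, replace=False)].astype(float)
    labels = np.full(len(X), -1)
    for _ in range(max_iter):
        # Squared Euclidean distance from every pixel to every center
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_labels = d2.argmin(axis=1)
        changed = np.mean(new_labels != labels)   # fraction of pixels that switched class
        labels = new_labels
        # Update each center; keep the old one if a class became empty
        for k in range(n_classes):
            members = X[labels == k]
            if len(members):
                centers[k] = members.mean(axis=0)
        if changed < change_threshold:
            break
    return labels


if __name__ == "__main__":
    rng = np.random.default_rng(3)
    means = rng.uniform(-10, 10, size=(5, 4))                   # five synthetic class centers
    X = np.vstack([m + rng.normal(size=(200, 4)) for m in means])
    labels = kmeans(X)                                           # Table II settings by default
    print(np.bincount(labels, minlength=5))
```

The broadcast distance computation holds a (pixels, classes, features) array in memory; for full MISR scenes it may be preferable to process pixels in chunks.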

Clustering the original and the assimilated data products produced similar classification results, as expected (Fig. 5).
Figure 5. K-means clustering obtained from the original and the PCA-fused data from the three experiments.


This coherence of results shows that PCA performed well as a fusion tool: results obtained from the original dataset with dimensionality x can also be obtained from the assimilated outputs with dimensionality y, where x > y. The main differences between the outputs will be discussed in future research dealing with computational and time complexity. It is also worth noting that clustering revealed some limitations: it clustered the original dataset, which holds 100% of the information, no better than the PCA outputs, which hold 93-99% of the information. However, clustering the PCA outputs is much faster than clustering the whole original dataset, as will be discussed in future work.
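Because k-means class numbers are arbitrary, comparing the clustering of the original data with that of a fused product requires matching classes first. The paper does not spell out its comparison measure; one simple option, sketched here purely as an assumption, is the fraction of pixels that receive the same class after the best one-to-one relabeling, found by brute force over the 5! = 120 permutations of five classes. The function name is hypothetical.

```python
from itertools import permutations

import numpy as np


def matched_agreement(labels_a: np.ndarray, labels_b: np.ndarray, n_classes: int = 5) -> float:
    """Fraction of pixels with the same class after the best relabeling of labels_b.

    Brute force over all permutations: fine for 5 classes (120 candidates).
    """
    # Contingency table: counts[i, j] = number of pixels labeled i in A and j in B
    counts = np.zeros((n_classes, n_classes), dtype=np.int64)
    np.add.at(counts, (labels_a, labels_b), 1)
    # perm[j] is the class in A that class j in B is mapped to
    best = max(sum(counts[perm[j], j] for j in range(n_classes))
               for perm in permutations(range(n_classes)))
    return float(best / labels_a.size)


if __name__ == "__main__":
    a = np.array([0, 0, 1, 1, 2, 2, 3, 4])
    b = np.array([3, 3, 0, 0, 1, 1, 4, 2])   # same partition, different class numbers
    print(matched_agreement(a, b))          # 1.0
```

For many classes, `scipy.optimize.linear_sum_assignment` applied to the negated contingency table finds the same best matching without enumerating permutations.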


III. CONCLUSIONS

In this research, we introduced PCA as a tool for data fusion. The different layouts show that how much information PCA fuses into its first component depends on the way it is performed. The first PC from the camera-based layout gave the highest information content, since it fuses spectral information, whereas the first PC from the frequency-based layout fuses spatial information and hence showed lower variance. When fusion was performed on all the bands, the lowest values were obtained. The classification results show that the three cases produce similar classification accuracy for the dust storm event. Hence, PCA can be used as an effective method for fusing information, and applying PCA in parts helps to better fuse information from various data sources.

IV. FUTURE WORK

Since PCA is a global operation, it will be worthwhile to study the computational efficiency and time complexity of the experimental layouts above. It will also be useful to investigate the clustering outputs corresponding to each PCA experimental design; this analysis will reveal the optimal layout for using PCA as a data fusion tool. Looking at the other PCs beyond the first few may also help discriminate dust events from surrounding noisy elements, such as clouds.